Lecture 1

Modeling spatio-temporal processes

Course Organization

Teachers: Edzer Pebesma / Christian Knoth

  • Course style: Lecture + Excercises
  • Lecture discusses the theory, excercises teach you how to work with and analyze data in practice (using R)
  • 5 ECTS (2 lecture, 3 excercises): 150 hours of work
  • Lectures: Tuesday 10-12 Room 130
  • Excercises: Wednesday 12-14 Room 130

Course Organization

Learnweb: MSTP-2015_2, password -> mstp1516

Slides: Weekly updates in Learnweb

Exam:

  • multiple choice, 4 possibilities, 40 questions, 20 need to be correct.

Communication:

  • Use learnweb forum to ask questions, student answers are very welcome (!)

Literature

  • Full script from 2023: http://edzer.github.io/mstp/
  • C. Chatfield, The analysis of time series: an introduction. Chapman and Hall: chapters 1, 2 and 3 [@chatfield]
  • R. Hyndman, G. Athanasopoulos: Foreasting: Principles and Practice
  • Spatial Data Science, with applications in R [@sdsr]:
    • Ch 1 (intro), 7 (sf, stars)
    • Ch 12 (interpolation)

Overview

Where we come from…

  • mathematics, linear algebra, computer science,
  • introduction to geostatistics
    • types of variables: Stevens’ measurement scales – nominal, ordinal, interval, ratio
    • or: discrete, continuous
    • t-tests, ANOVA
    • regression, multiple regression (but now how we compute it)
    • assumption was: observations are independent
    • what does independence mean?

Overview

In this course,

  • we will study dependence in observations, in space, time, or space-time
  • in space and/or time, Stevens’ measurement scales are not enough!
  • Examples:
    • linear time, cyclic time
    • space: functions, fields
  • we will study how we can represent phenomena, by
    • mathematical representations (models)
    • computer representations (models)
  • we will consider how well these models correspond to our observations

Topics

  • Time series models
    • (Partial) autocorrelations, ARIMA(p,d,q)
    • Model selection, AIC
    • Forecasting, decomposition
    • Case study: Land use change monitoring from satellite imagery
  • Optimisation
    • Linear models, least squares, normal equations
    • Non-linear:
      • One-dimensional: golden search
      • Multi-dimensional least squares: Newton
      • Multi-dimensional stochastic search: Metropolis
      • Multi-dimensional stochastic optimisation: Simulated annealing

Topics

  • Spatial models
    • Simple, heuristic spatial interpolation approaches
    • Spatial correlation
    • Regression with spatially correlated data
    • Kriging: best linear (unbiased) prediction
    • Stationarity, variogram
    • Kriging varieties: simple, ordinary, universal kriging
    • Kriging as a probabilistic spatial predictor
  • Spatio-temporal variation modelled by partial differential equations
  • Agent-based approaches

Spatio-temporal phenomena are everywhere

  • if we think about it, there are no data that can be non-spatial or non-temporal.
  • in many cases, the spatial or temporal references are not essential
    • think: brain image of a person: time matters, but mostly referenced with respect to the age of the person, spatial location of the MRI scanner does not
    • but: ID of the patient does!
    • and: time of scan matters too!
  • we will “pigeon-hole” (classify) phenomena into: fields, objects, aggregations

Fields

  • many processes can be represented by fields, meaning they could be measured everywhere
  • think: temperature in this room
  • typical problems: interpolation, patterns, trends, temporal development, forecasting?

Objects and events

  • objects can be identified
  • objects are identified within a frame (or window) of observation
  • within this window, between objects, there are no objects (no point of interpolation)
  • objects can be moving (people), or static (buildings)
  • objects or events are sometimes obtained by thresholding fields, think heat wave, earthquake, hurricane, [see e.g. @camara2014fields]
  • sometimes this view is rather artificial, think cars, persons, buildings

Fields - objects/events conversions

  • we can convert a field into an object by thresholding (wind field, storm or hurricane)
  • we can convert objects into a field e.g. by computing the density as a continuous function

Aggregations

Aims of modelling

… could be

  • curiousity
  • fun: studying models is easier than measuring the world around us

More scientific aims of modelling are

  • to learn about the world around us
  • to predict the past, current or future, in case where measurement is not feasible.

What is a model?

  • conceptual models, e.g. the water cycle (wikipedia:) the water cycle

the water cycle, updated
  • object models, such as UML (wikipedia:) UML
  • mathematical models, such as Navier Stokes’ equation, (wikipedia:) Navier Stokes equation

What is a mathematical model?

A mathematical model is an abstract model that uses mathematical language to describe the behaviour of a system, quoting [@eykhoff] a mathematical model is:

a representation of the essential aspects of an existing system (or a system to be constructed) which presents knowledge of that system in usable form

In the natural sciences, a model is always an approximation, a simplification of reality. If degree of approximation meets the required accuracy, the model is useful, or valid (of value). A validated model does not imply that the model is “true”; more than one model can be valid at the same time.

Time series models

we will first look into time series models, because they are + simple + easy to write down + well understood

time series models are roughly divided in

  1. time domain models and, which look at correlations and memory
  2. frequency domain models, which focus on periodicities

Spatial equivalents are mostly found in (a), although (b) has spatial equivalences as well (e.g. wavelets).

Some data

Consider the following process (\(\Delta t\) = 1 min):

[1] "meteo"
 [1] "ID"            "year"          "julian.day"    "time"         
 [5] "T.outside"     "pressure"      "humidity"      "X"            
 [9] "windspeed"     "std.dev."      "Wind.dir"      "std.dev..1"   
[13] "TippingBucket" "mins"          "hours"         "date"         
[17] "T.per"        

Some questions

  • how can we describe this process in statistical terms?
  • how can we model this process?
  • (how) can we predict future observations?

White noise

Perhaps the simplest time series model is white noise with mean \(m\):

\[y_t = m + e_t, \ \ e_t \sim N(0,\sigma^2)\]

\(N(0,\sigma^2)\) denoting the normal distribution with mean 0 and variance \(\sigma^2\), and \(\sim\) meaning distributed as or coming from.

\(t\) is the index \(t=1,2,...,n\) of the observation, and refers to specific times, which, when not otherwise specified are at regular intervals.

White noise

A white noise process is completely without memory: each observation is independent from its past or future. Plotting independent, standard normal values against their index (the default for plotting a vector in R) shows how a white noise time series would look like:

Autocorrelation

Autocorrelation (or lagged correlation) is the correlation between \(y_i\) and \(y_{i+h}\), as a function of the lag \(h\): \[ r(h) = \frac{\sum_{i=1}^{n-h}(y_i-\bar{y})(y_{i+h}-\bar{y})}{\sum_{i=1}^n (y_i-\bar{y})^2} \] with \(\bar{y} = \frac{1}{n} \sum_{i=1}^n y_i\)

Autocorrelation of white noise

We can look at the auto-correlation function of a white noise process, and find it is uncorrelated for any lag larger than 0:

Random walk

A simple, next model to look at is that of random walk, where each time step a change is made according to a white noise process: \[y_t = y_{t-1} + e_t\] Such a process has memory, and long-range correlation. If we take the first-order differences, \[y_t - y_{t-1} = e_t\] we obtain the white noise process.

Further, the variance of the process increases with increasing domain (i.e., it is non-stationary)

Example random walk:

We can compute it as the cumulative sum of standard normal deviates: \(y_n = \sum_{i=1}^n e_i\):

Code
#
# generate three series:
rw1 = cumsum(rnorm(5000))
rw2 = cumsum(rnorm(5000))
rw3 = cumsum(rnorm(5000))
plot(rw1, type='l', ylim = range(c(rw1,rw2,rw3)))
lines(rw2, type='l', col='red')
lines(rw3, type='l', col='blue')

Example random walk:

MA(1), MA(q)

Let \(e_t\) be a white noise process. A moving average process of order \(q\) is generated by \[y_t = \beta_0 e_t + \beta_1 e_{t-1} + ... + \beta_q e_{t-q}\]

Note that the \(\beta_j\) are weights, and could be \(\frac{1}{q+1}\) to obtain an unweighted average. Moving averaging smoothes the white noise series \(e_t\).

MA(1), MA(q)

Moving average over monthly CO2 measurements on Maunaloa:

MA(1), MA(q)

Moving averages over a white noise process:

Wider moving average filters give new processes with + less variation + stronger correlation, over larger lags

AR(1), AR(p)

An auto-regressive (1) model, or AR(1) model is generated by \[y_t = \phi_1 y_{t-1}+e_t\] and is sometimes called a Markov process. Given knowledge of \(y_{t-1}\), observations further back carry no information; more formally: \[\Pr(y_t|y_{t-1},y_{t-2},...,y_{t-q}) = \Pr(y_t|y_{t-1})\]

  • \(\phi_1 = 1\) gives random walk, \(\phi_1=0\) gives white noise.
  • AR(1) processes have correlations beyond lag 1
  • AR(1) processes have non-significant partial autocorrelations beyond lag 1

AR(1), AR(p)

As an example, we create (simulate) an AR(1) process with \(\phi_1=0.85\) and \(e\) drawn from the standard normal distribution (mean 0, variance 1).

Compare the shape of the acf with that obtained from a MA process: different!

AR(p)

\[y_t = \phi_1 y_{t-1}+ \phi_2 y_{t-2} + ... + \phi_p y_{t-p} + e_t\] or \[y_t = \sum_{j=1}^p \phi_j y_{t-j}+e_t\] Properties:

  • The state of \(y_t\) does not only depend on \(y_{t-1}\), but observations further back contain information
  • AR(p) have autocorrelations beyond lag p
  • AR(p) have “zero” partial autocorrelations beyond lag p

AR(p)

Partial correlation

  • Correlation between \(y_t\) and \(y_{t-2}\) is simply obtained by plotting both series of length \(n-2\), and computing correlation
  • Lag-2 partial autocorrelation of \(y_t\) and \(y_{t-2}\), given the value inbetween \(y_{t-1}\) is obtained by
  • computing residuals \(\hat{e}_t\) from regressing of \(y_t\) on \(y_{t-1}\)
  • computing residuals \(\hat{e}_{t-2}\) from regressing of \(y_{t-2}\) on \(y_{t-1}\)
  • computing the correlation between both residual series \(\hat{e}_t\) and \(\hat{e}_{t-2}\).
  • Lag-3 partial autocorrelation regresses \(y_t\) and \(y_{t-3}\) on both intermediate values \(y_{t-1}\) and \(y_{t-2}\)
  • etc.

Partial correlation

Partial correlation can help reveal what the order of an MA(q) or AR(p) series is:

Relation between AR and MA processes

Chatfield [@chatfield] has more details about this. Substitute the AR(1) as follows \[y_t = \phi_1 y_{t-1} + e_t\] \[y_t = \phi_1 (\phi_1 y_{t-2} + e_{t-1}) + e_t\] \[y_t = \phi_1^2 (\phi_1 y_{t-3} + e_{t-2}) + \phi_1 e_{t-1} + e_t\] etc. In the limit, we can write any AR process as an (infinite) MA process, and vice versa.